EPOpt: Learning Robust Neural Network Policies Using Model Ensembles
Abstract
Sample complexity and safety are major challenges when learning policies with reinforcement learning for real-world tasks, especially when the policies are represented using rich function approximators such as deep neural networks. Model-based methods, in which the real-world target domain is approximated by a simulated source domain, offer a way to address these challenges by augmenting real data with simulated data. However, discrepancies between the simulated source domain and the target domain make training in simulation difficult. We introduce the EPOpt algorithm, which uses an ensemble of simulated source domains and a form of adversarial training to learn policies that are robust and generalize to a broad range of possible target domains, including unmodeled effects. Further, the probability distribution over source domains in the ensemble can be adapted using data from the target domain and approximate Bayesian methods, progressively making it a better approximation. Thus, learning on a model ensemble, together with source domain adaptation, provides the benefits of both robustness and learning/adaptation.
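The two ideas in the abstract, robust training on an ensemble of source domains and approximate Bayesian adaptation of the distribution over those domains, can be illustrated with a small Python sketch under toy assumptions. Everything concrete here is hypothetical and not the paper's implementation: the one-parameter "mass" model, the return function `toy_return`, the plain gradient step (standing in for a batch policy-optimization step), and the Gaussian observation likelihood. The "adversarial training" is assumed here to take the form of updating the policy only on the worst ε-fraction of sampled source domains, a CVaR-style objective; the adaptation step is a simple importance-weight-and-resample (particle) update.

```python
import math
import random
import statistics


# Hypothetical toy model (not from the paper): a source domain is described by
# one physical parameter, a mass m; a policy is a scalar gain theta, and its
# return in domain m is R(theta, m) = -(theta * m - 1)^2 (best at theta = 1/m).
def toy_return(theta: float, m: float) -> float:
    return -(theta * m - 1.0) ** 2


def toy_return_grad(theta: float, m: float) -> float:
    # Derivative of toy_return with respect to theta.
    return -2.0 * m * (theta * m - 1.0)


def worst_eps_subset(returns: list[float], eps: float) -> list[int]:
    """Indices of the worst eps-fraction of returns (always at least one)."""
    k = max(1, math.ceil(eps * len(returns)))
    order = sorted(range(len(returns)), key=lambda i: returns[i])
    return order[:k]


def epopt_step(theta: float, masses: list[float], eps: float, lr: float) -> float:
    """One adversarial ensemble update (assumed worst-eps form): evaluate the
    policy on every sampled source domain, keep only the worst eps-fraction,
    and take a gradient step on their mean return. The plain gradient step
    stands in for the batch policy-optimization step of a real implementation."""
    returns = [toy_return(theta, m) for m in masses]
    worst = worst_eps_subset(returns, eps)
    grad = statistics.fmean(toy_return_grad(theta, masses[i]) for i in worst)
    return theta + lr * grad


def adapt_source_distribution(particles, observations, noise_std, rng):
    """Approximate Bayesian update of the source distribution, represented by
    samples (particles): weight each particle by the Gaussian likelihood of the
    target-domain observations, then resample in proportion to the weights."""
    log_w = [sum(-0.5 * ((y - m) / noise_std) ** 2 for y in observations)
             for m in particles]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]  # shift by max for stability
    return rng.choices(particles, weights=weights, k=len(particles))


rng = random.Random(0)

# Robust training: draw a fresh batch of source domains each iteration.
theta = 0.5
for _ in range(200):
    batch = [rng.gauss(1.0, 0.2) for _ in range(100)]
    theta = epopt_step(theta, batch, eps=0.1, lr=0.05)

# Adaptation: hypothetical noisy mass estimates from the target domain.
particles = [rng.gauss(1.0, 0.3) for _ in range(500)]
adapted = adapt_source_distribution(particles, [1.3, 1.25, 1.35], 0.1, rng)
print(f"robust gain: {theta:.3f}")
print(f"source mean before/after adaptation: "
      f"{statistics.fmean(particles):.3f} / {statistics.fmean(adapted):.3f}")
```

The parameter ε sets how pessimistic the update is: with `eps=1.0` every sampled domain is kept and the step reduces to ordinary averaging over the ensemble. In a complete loop, training batches would then be drawn from the adapted particles, so the policy stays robust while the source distribution narrows toward the target domain.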
Journal: CoRR, vol. abs/1610.01283
Publication date: 2016